In this reporting period we have accomplished the following:

i. Developed a procedure for totally self-checking (TSC) checker design for
m-out-of-2m codes at the transistor level.

ii. Derived a technique for designing TSC fault-tolerant systems.

i. TSC checker for m-out-of-2m codes

The m-out-of-2m (m/2m) codes are special cases of m-out-of-n codes, which are
useful for detecting single-bit and unidirectional multibit errors in information bits. The
direct mapping of a gate-level TSC checker to its transistor-level equivalent cannot
guarantee the TSC property. This is because not all faults at the transistor level can be
modeled as stuck-at faults, which are commonly assumed at the gate level. We have
developed an approach for implementing checkers for m/2m codes at the transistor level
which are TSC with respect to the following faults:

a) single stuck-at faults at input and output signal lines;

b) stuck-on and stuck-open transistor faults;

c) bridges between input signal lines;

d) breaks in input signal lines;

e) bridges in source-drain (SD), gate-source (GS) and gate-drain (GD) of
transistors.
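
The error-detecting property of m/2m codes mentioned above can be illustrated with a small sketch. It is only an illustration of the code itself, not of the checker circuit; the function names are hypothetical. A unidirectional error flips only 1s to 0s or only 0s to 1s, so it always changes the number of 1s and can never turn one codeword into another.

```python
from itertools import combinations

def is_codeword(bits, m):
    """Return True when bits has length 2m and exactly m ones."""
    return len(bits) == 2 * m and sum(bits) == m

def unidirectional_errors(word):
    """Yield every word obtained by flipping only 1s to 0s, or only 0s to 1s."""
    for source_bit in (1, 0):
        positions = [i for i, b in enumerate(word) if b == source_bit]
        for k in range(1, len(positions) + 1):
            for chosen in combinations(positions, k):
                corrupted = list(word)
                for i in chosen:
                    corrupted[i] = 1 - source_bit
                yield tuple(corrupted)

word = (1, 1, 0, 0)  # a 2-out-of-4 codeword
# Collect corrupted words that would still pass as codewords (expected: none).
undetected = [w for w in unidirectional_errors(word) if is_codeword(w, 2)]
```

Every corrupted word has a weight different from m, so a weight check is sufficient to detect it.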

We first propose a TSC checker for the 2/4 code, which is designed by replacing the NAND
and NOR gates in a gate-level 2/4 TSC checker with the new circuit structures shown in Fig. 1,
instead of using the traditional CMOS implementation of NAND and NOR gates. Fig. 2 shows the
transistor-level implementation of the TSC checker for the 2/4 code.


Theorem. The checker circuit of Fig. 2 is TSC for the faults assumed in the last section.

For the sake of brevity, the proof of the theorem is not included.

The 2/4 checker is used as a building block for constructing checkers for several other
m/2m codes. A general procedure for designing checkers for m/2m codes, where m = 3, 4, 5
and 6, is presented.

TSC Checker Design for m-out-of-2m Codes (m = 3, 4, 5, 6):

The  procedure  for  TSC  checker  design  consists  of  the  following  steps: 

Step  1: 

Case  1.  m  is  odd  (m  =  3,  5) 

i) Partition inputs {x_1 ... x_{2m}} into two blocks A and B such that block A has m+1 input
variables and block B has m-1 variables, i.e., A = {x_1 ... x_{m+1}}, B = {x_{m+2} ... x_{2m}}.

ii) Connect the input variables in block A to a TSC checker for the (m+1)/2-out-of-(m+1) code
and identify its output as (z_1 z_2)_A. For m = 5, invert the input variables in block B and
connect these to a TSC 2/4 checker; identify its output as (z_1 z_2)_B. For m = 3,
(z_1 z_2)_B = x_5 x_6.

iii) Connect (z_1 z_2)_A and (z_1 z_2)_B to a TSC 2-out-of-4 checker.

Case  2.  m  is  even  (m  =  4,  6) 

i) Partition inputs {x_1 ... x_{2m}} into two blocks A and B, each having m input
variables, i.e., A = {x_1 ... x_m}, B = {x_{m+1} ... x_{2m}}.

ii) Connect the input variables in block A to an (m/2)-out-of-m checker, and mark its output as
(z_1 z_2)_A.

iii) Connect the input variables in block B to an (m/2)-out-of-m checker, and mark its output as
(z_1 z_2)_B.

iv) Connect (z_1 z_2)_A and (z_1 z_2)_B to the TSC 2-out-of-4 checker.
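
The partitioning in Step 1 can be sketched as follows. This is a minimal illustration under stated assumptions: inputs are represented by their 1-based indices, the function names are hypothetical, and the sub-checker code for block A is taken to be the half-weight code on that block (2/4 for four inputs, 3/6 for six inputs).

```python
def step1_partition(m):
    """Return blocks (A, B) of 1-based input indices as described in Step 1."""
    if m not in (3, 4, 5, 6):
        raise ValueError("the procedure is stated for m = 3, 4, 5, 6")
    # Odd m: block A gets m+1 inputs and block B gets m-1.
    # Even m: both blocks get m inputs.
    size_a = m + 1 if m % 2 else m
    inputs = list(range(1, 2 * m + 1))
    return inputs[:size_a], inputs[size_a:]

def block_a_code(m):
    """Return (weight, length) of the code checked on block A (assumed half-weight)."""
    length = m + 1 if m % 2 else m
    return length // 2, length
```

For example, `step1_partition(3)` gives blocks of four and two inputs, and `block_a_code(5)` gives `(3, 6)`, i.e. a 3/6 checker on block A.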

Step  2: 

i) Partition inputs {x_1 ... x_{2m}} into two blocks A_I and B_I. For m = 4, 6, each block has m
inputs, i.e., A_I = {x_1 ... x_m}, B_I = {x_{m+1} ... x_{2m}}. For m = 3, 5, block A_I has m+1
inputs and block B_I has m-1 inputs, i.e., A_I = {x_1 ... x_{m+1}}, B_I = {x_{m+2} ... x_{2m}}.

ii) For m = 3, 4, partition block A_I into two blocks a_1 and a_2, each having 2 input
variables, i.e., a_1 = {x_1 x_2} and a_2 = {x_3 x_4}. For m = 4, partition block B_I into two blocks
b_1 and b_2 such that b_1 = {x_5 x_6} and b_2 = {x_7 x_8}. Let block A_II = {a_2 B_I} and B_II = {a_1}
for m = 3, and let block A_II = {a_2 b_2} and B_II = {a_1 b_1} for m = 4. Connect blocks A_I and
B_I to a checker block I designed by Step 1; identify its output as (z_1 z_2)_I. Connect blocks
A_II and B_II to a checker block II designed by Step 1; identify its output as (z_1 z_2)_II.

iii) For m = 5, partition block A_I into three blocks a_1, a_2 and a_3 such that a_1 = {x_1 x_2},
a_2 = {x_3 x_4} and a_3 = {x_5 x_6}, and partition block B_I into two blocks b_1 and b_2 such that
b_1 = {x_7 x_8} and b_2 = {x_9 x_10}. For m = 6, partition block A_I into two blocks a_1 and a_2
such that a_1 = {x_1 x_2 x_3} and a_2 = {x_4 x_5 x_6}, and partition block B_I into two blocks b_1
and b_2 such that b_1 = {x_7 x_8 x_9} and b_2 = {x_10 x_11 x_12}. For m = 5, let block A_II = {a_2 a_3
b_2}, B_II = {a_1 b_1}, A_III = {a_3 b_1 b_2} and B_III = {a_1 a_2}. For m = 6, let block A_II = {a_2
b_2}, B_II = {a_1 b_1}, A_III = {a_1 b_2} and B_III = {a_2 b_1}. Connect blocks A_I and B_I to a
checker block I designed by Step 1; identify its output as (z_1 z_2)_I. Connect blocks A_II and
B_II to a checker block II designed by Step 1; identify its output as (z_1 z_2)_II. Connect
blocks A_III and B_III to a checker block III designed by Step 1; identify its output as
(z_1 z_2)_III.

iv) For m = 3, 4, connect (z_1 z_2)_I and (z_1 z_2)_II to a TSC two-rail checker (TRC) to
produce the checker's final output.

v) For m = 5, 6, connect (z_1 z_2)_III and (z_2 z_1)_I to a TSC TRC that produces outputs A_1
and A_2. Connect (z_1 z_2)_I and (z_1 z_2)_II to a TSC TRC to produce outputs B_1 and B_2.
Finally, connect A_1, A_2, B_1 and B_2 to a TSC TRC to produce the final output.
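
The input partitions produced by Step 2 can be enumerated with a short sketch. It assumes inputs are identified by their 1-based indices; the function name and the dictionary layout are hypothetical and chosen only for illustration.

```python
def step2_partitions(m):
    """Return {checker block label: (A, B)} of 1-based input indices per Step 2."""
    x = list(range(1, 2 * m + 1))
    size_a = m + 1 if m % 2 else m          # odd m: A_I has m+1 inputs
    A1, B1 = x[:size_a], x[size_a:]
    if m == 3:
        a1, a2 = A1[:2], A1[2:]
        return {"I": (A1, B1), "II": (a2 + B1, a1)}
    if m == 4:
        a1, a2, b1, b2 = A1[:2], A1[2:], B1[:2], B1[2:]
        return {"I": (A1, B1), "II": (a2 + b2, a1 + b1)}
    if m == 5:
        a1, a2, a3 = A1[:2], A1[2:4], A1[4:]
        b1, b2 = B1[:2], B1[2:]
        return {"I": (A1, B1),
                "II": (a2 + a3 + b2, a1 + b1),
                "III": (a3 + b1 + b2, a1 + a2)}
    if m == 6:
        a1, a2, b1, b2 = A1[:3], A1[3:], B1[:3], B1[3:]
        return {"I": (A1, B1),
                "II": (a2 + b2, a1 + b1),
                "III": (a1 + b2, a2 + b1)}
    raise ValueError("the procedure is stated for m = 3, 4, 5, 6")
```

In every case the A block of a partition has the size required by Step 1 (m+1 inputs for odd m, m inputs for even m), so each partition can be fed to a checker block of the same design.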

The symbol of the TSC TRC and its input-output pattern are shown in Figure 3. In the
following discussion, n represents the number of 1s at the checker's input, n_A is the
number of 1s at input block A, n_B is the number of 1s at input block B, and so on.
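
A behavioral sketch of a two-rail checker may help in reading the tables that follow. Figure 3 is not reproduced in this text, so the Boolean function below is an assumption: a conventional two-rail checker with inverted outputs, chosen because it agrees with the legible rows of Table 1. It says nothing about the transistor-level structure, and the names are hypothetical.

```python
from itertools import product

CODE = {(0, 1), (1, 0)}  # valid two-rail (1-out-of-2) pairs

def trc(a, b):
    """Assumed two-rail checker: combine two rail pairs into one rail pair."""
    a1, a2 = a
    b1, b2 = b
    z1 = 1 - ((a1 & b2) | (a2 & b1))
    z2 = 1 - ((a1 & b1) | (a2 & b2))
    return z1, z2

def trc_is_code_disjoint():
    """Check that the output is a code pair exactly when both inputs are."""
    pairs = list(product((0, 1), repeat=2))
    return all((trc(a, b) in CODE) == (a in CODE and b in CODE)
               for a in pairs for b in pairs)
```

Under this model, two valid input pairs always give 01 or 10, while any 00 or 11 input pair forces the output to 00 or 11.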

TSC  Checker  Design  for  3/6  and  4/8  codes 

We illustrate the above procedure by designing TSC checkers for the 3/6 code (m = 3) and the
4/8 code (m = 4). The checker blocks designed by following Step 1 of the general
procedure are shown in Figure 4(a) and (b) for m = 3 and m = 4, respectively.

The  two  separate  partitions  on  the  input  variables  generated  from  step  2  are: 

For m = 3,

i) A_I = {x_1 x_2 x_3 x_4} and B_I = {x_5 x_6};

ii) A_II = {x_3 x_4 x_5 x_6} and B_II = {x_1 x_2}.

For m = 4,

i) A_I = {x_1 x_2 x_3 x_4} and B_I = {x_5 x_6 x_7 x_8};

ii) A_II = {x_3 x_4 x_7 x_8} and B_II = {x_1 x_2 x_5 x_6}.

Inputs belonging to a partition are connected to checker blocks as shown in Fig. 5(a) and
5(b) respectively. The outputs of the checker blocks corresponding to partitions I and II
are identified by (z_1 z_2)_I and (z_1 z_2)_II respectively. These outputs feed a TSC TRC to
produce the final output of the 3/6 and 4/8 checker. As shown in Tables 1 and 2, both of
these checkers satisfy the code-disjoint property.
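
The code-disjoint property can be checked exhaustively with a behavioral simulation. The sketch below is not the transistor-level design of Figs. 1, 2 and 5: it assumes a conventional gate-level function for the 2/4 checker and a two-rail checker with inverted outputs, both chosen only because they agree with the legible rows of Table 1. All function names are hypothetical.

```python
from itertools import product

CODE = {(0, 1), (1, 0)}

def chk24(p, q, r, s):
    """Assumed 2/4 checker: 01 or 10 for weight 2, 00 below, 11 above."""
    return (p | q) & (r | s), (p & q) | (r & s)

def trc(a, b):
    """Assumed two-rail checker with inverted outputs."""
    z1 = 1 - ((a[0] & b[1]) | (a[1] & b[0]))
    z2 = 1 - ((a[0] & b[0]) | (a[1] & b[1]))
    return z1, z2

def block6(a, b):
    """Step 1, m = 3: block A through a 2/4 checker, block B used directly."""
    return chk24(*chk24(*a), *b)

def block8(a, b):
    """Step 1, m = 4: each 4-input block through its own 2/4 checker."""
    return chk24(*chk24(*a), *chk24(*b))

def checker_3_6(x):
    x1, x2, x3, x4, x5, x6 = x
    z_I = block6((x1, x2, x3, x4), (x5, x6))    # partition I
    z_II = block6((x3, x4, x5, x6), (x1, x2))   # partition II
    return trc(z_I, z_II)

def checker_4_8(x):
    x1, x2, x3, x4, x5, x6, x7, x8 = x
    z_I = block8((x1, x2, x3, x4), (x5, x6, x7, x8))    # partition I
    z_II = block8((x3, x4, x7, x8), (x1, x2, x5, x6))   # partition II
    return trc(z_I, z_II)

def code_disjoint(checker, m):
    """True when the output is 01 or 10 exactly for inputs of weight m."""
    return all((checker(x) in CODE) == (sum(x) == m)
               for x in product((0, 1), repeat=2 * m))
```

A single checker block is not code-disjoint on its own; for instance, block I of the 3/6 checker also accepts the input with only x_5 and x_6 set. The second partition rejects such inputs, which is why both partitions feed the final TRC.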

The TSC checkers for the 5/10 code and the 6/12 code are designed in a similar manner, and are
shown in Fig. 6 and Fig. 7 respectively.


ii. TSC Fault-Tolerant System Design


This  work  has  resulted  in  a  paper  which  has  been  accepted  for  conference  presentation 
and  publication  in  the  Proceedings  of  the  Second  International  Conference  on  Reliability 
and  Quality  in  Design  (RQD’95  Proceedings).  A  copy  of  the  paper  is  attached. 
Fig. 1. CMOS implementation of NAND, NOR functions: (a) NOR circuit; (b) NAND circuit.

Figure 3. TSC two-rail checker (TRC)

Figure 4. (a) 6-input checker block; (b) 8-input checker block

Figure 5. (a) TSC checker for 3/6 code; (b) TSC checker for 4/8 code


Table 1. Code disjoint property of the 3/6 checker.

                 3/6 checker inputs                 TRC inputs               3/6 checker outputs
                 n   n_AI n_BI     n_AII n_BII      (z_1 z_2)_I  (z_1 z_2)_II  z_1 z_2

codewords        3   1 2           3 0              01           01            10
                                   2 1              01           10            01
                     2 1           3 0, or 1 2      10           01            01
                                   2 1              10           10            10
                     3 0           2 1              01           10            01
                                   1 2              01           01            10

non-code words   0   0 0           0 0              00           00            11

                 1   0 1, or 1 0   0 1, or 1 0      00           00            11

                 2   0 2           2 0              01           00            11
                     2 0           2 0, or 1 1      00           00
                                   0 2              00           01
                     1 1           2 0, or 1 1      00           00

                 4   4 0           2 2              01           11            00
                     3 1           3 1, or 2 2      11           11
                     2 2           3 1, or 2 2      11           11
                                   4 0              11           01

                 5   3 2           3 2, or 4 1      11           11            00
                     4 1           3 2              11           11

                 6   4 2           4 2              11           11            00

Table 2. Code disjoint property of the TSC checker for 4/8 code.

4/8 checker inputs    TRC inputs    4/8 checker outputs


Fig. 6. Totally Self-Checking checker for 5-out-of-10 code


Fig. 7. Totally Self-Checking checker for 6-out-of-12 code
Abstract

A scheme for designing fault-tolerant systems
which are also totally self-checking for all single
faults is presented in this paper. The system will
provide  correct  output  in  the  presence  of  a  single 
faulty  element  and  identify  the  element  as  well.  The 
scheme  also  allows  distinction  between  a  permanent 
fault  and  a  transient/intermittent  fault. 


1.  Introduction 

One  of  the  established  methods  of  designing 
a  reliable  system  from  less  reliable  components  is  the 
TMR (Triple Modular Redundancy) technique [1]. The
output  of  a  TMR  system  is  the  majority  of  three 
identical  components.  Thus,  such  a  system  can 
tolerate  errors  in  any  one  component.  The  major 
drawback  of  the  TMR  system  is  that  if  two  modules 
fail  or  the  voter  has  a  fault  which  cannot  be  masked, 
then  the  system  produces  erroneous  outputs  without 
giving  any  indication  of  failure.  Two  approaches  have 
been proposed to overcome this problem [2,3]. Both
approaches incorporate error-checking circuits to
detect erroneous outputs. However, both approaches
suffer from the disadvantage that the additional
circuitry is not self-checking. Recently, another
approach has been proposed to implement totally
self-checking TMR fault-tolerant systems [4]. This
approach  allows  detection  of  both  unmasked  and 
masked  faults;  however,  no  distinction  is  made 
between  a  permanent  and  a  transient/intermittent 
fault.  Moreover,  the  circuit  overhead  is  high. 
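
The TMR drawback described above can be seen in a minimal sketch of a majority voter. The function name and the 4-bit example values are hypothetical; module outputs are modeled as integers and voted bit by bit.

```python
def majority(x, y, z):
    """Bitwise majority vote over three module outputs."""
    return (x & y) | (x & z) | (y & z)

good = 0b1010   # assumed fault-free module output
bad = 0b0110    # assumed erroneous module output

masked = majority(good, good, bad)     # one faulty module: error is masked
unmasked = majority(good, bad, bad)    # two modules agree on the wrong value
```

With one faulty module the voter still returns the correct value, but when two modules produce the same wrong value the voter returns it without any indication of failure.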

2.  Fault  tolerant  implementation 

We propose a new scheme for fault-tolerant
system design which also makes the system totally
self-checking; the concept of self-checking design has
been discussed in [5]. In the proposed scheme, the
simplex (non-redundant) circuit is replaced by three
identical copies X, Y and Z, as shown in Fig. 1. As in
the TMR system, all three copies receive the same
input. The output of each module is compared with
the outputs of the remaining two. The outputs of the
comparators are identified as a, b and c. If a=0, the
outputs of modules X and Y match, whereas a=1
indicates a mismatch between the outputs of the two
modules. Similarly, b and c indicate the compared
values of modules X/Z and Y/Z respectively. It is
clear that if a module produces a faulty output, the
outputs of two comparators will be at 1, i.e., the outputs
of the comparators will form a 2-out-of-3 code. On
the other hand, if one comparator is faulty, the values
of a, b and c will constitute a 1-out-of-3 code. Table 1
shows how a faulty component can be identified from
the values of a, b, c. If one of the comparators is
faulty, the single module fault assumption is no longer
valid, hence no corrective action can be taken. Also,
if abc=111, at least two modules are faulty and the
corrective action is not activated.
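
The diagnosis rule described above can be sketched as a lookup on the comparator outputs. The rows for abc = 010 and 001 are inferred by symmetry from the text (a single comparator fault gives a 1-out-of-3 pattern); where Table 1 allows several output sources, one is picked arbitrarily. Function names and return values are hypothetical.

```python
def comparators(x, y, z):
    """Return (a, b, c): 1 marks a mismatch for X/Y, X/Z and Y/Z respectively."""
    return int(x != y), int(x != z), int(y != z)

def diagnose(a, b, c):
    """Map comparator outputs to (faulty component, module used as output source)."""
    table = {
        (0, 0, 0): (None, "X"),               # no fault: any module may drive the output
        (1, 1, 0): ("module X", "Z"),
        (1, 0, 1): ("module Y", "Z"),
        (0, 1, 1): ("module Z", "Y"),
        (1, 0, 0): ("comparator a", None),    # no corrective action
        (0, 1, 0): ("comparator b", None),    # inferred by symmetry
        (0, 0, 1): ("comparator c", None),    # inferred by symmetry
        (1, 1, 1): ("two or more modules", None),
    }
    return table[(a, b, c)]
```

For example, if module X alone produces a wrong value, `comparators` returns (1, 1, 0) and `diagnose` names module X while selecting a fault-free module as the source.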

The  function  of  the  decision  and  correction  logic 
in Fig. 1 is to reconfigure the system so that the
system  output  is  derived  from  a  fault-free  module.  As 
mentioned  previously,  depending  on  the  outputs  of  the 
comparators  a,  b  and  c,  one  module  is  selected  to 
provide  the  correct  output.  The  decision  logic  consists 
of  an  encoder,  a  totally  self-checking  checker  and 
circuitry  for  enabling  the  tri-state  buffers.  The 
encoder  accepts  the  outputs  of  the  comparators  and 
converts  them  into  a  2-out-of-4  code  as  shown  in 
Table  2.  A  totally  self-checking  2-out-of-4  checker  is 
placed  at  the  output  of  the  encoder  circuit  to  check  the 
validity  of  the  codeword. 

Finally, the output of the system is derived
by enabling one of the buffers as indicated in Table 1.
If a comparator is faulty, or two or more modules are
faulty, i.e., lmnp = -00- or -11-, a flag is generated
and all the modules are disconnected from the output
bus, thus preventing the propagation of erroneous
information. If there is no fault, the output of the
enable circuit will form a 1-out-of-4 code. A checker
circuit, which is totally self-checking for single and
unidirectional multiple errors, is placed at the output
of the enable circuit. The checker will produce a
1-out-of-2 code if it receives a 1-out-of-4 code and
there is no fault in the circuit itself. If the checker
produces a 00 or 11 output, the system needs repair.

The function of the retry circuit (Fig. 2) is to
distinguish between a permanent fault and a
transient/intermittent fault. It is assumed that the
duration of a transient fault is less than two clock
periods. Once a fault is detected, i.e., abc ≠ 000, it is
checked whether the fault is of transient or permanent
nature. This is accomplished by clocking the values
of abc into a 3-bit register. If abc ≠ 000, there is a
faulty component, the error signal will go to 1, and
the contents of the register will feed the AND gate
inputs via the multiplexers. If the next set of values of
abc is exactly the same as that stored in the register,
the corresponding fault is assumed to be of permanent
nature, whereas abc = 000 will indicate that the fault
is of transient/intermittent nature. If a fault is found to
be permanent, it can be diagnosed to a replaceable
component which is identified by the contents of the
3-bit register. For example, if module X has a
permanent fault, the contents of the register for two
consecutive pulses will be 110 (as indicated in Table
1). The retry circuit can be tested off-line for single
faults.
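
The retry decision can be summarized in a short behavioral sketch. It models only the comparison of two consecutive abc samples, not the register, multiplexers or AND gates of Fig. 2. The function name is hypothetical, and the "undetermined" outcome for a second sample that is neither 000 nor equal to the stored one is an assumption, since the text does not cover that case.

```python
NO_FAULT = (0, 0, 0)

def classify_fault(stored, following):
    """Classify a fault from the stored abc sample and the one that follows it."""
    if stored == NO_FAULT:
        return "no fault"
    if following == stored:
        return "permanent"
    if following == NO_FAULT:
        return "transient/intermittent"
    return "undetermined"  # assumption: not specified in the text
```

For a permanent fault in module X, both samples are (1, 1, 0), and the stored value identifies the replaceable component.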

3.  Conclusion 

A scheme for improving the reliability of digital
systems by incorporating fault tolerance and
self-checking concepts is presented in this paper. The
major advantage of this scheme is that it will not only
provide correct output in the presence of a single
faulty element, comparator or functional module, but
will identify the faulty element as well. In addition,
the scheme allows distinction between a permanent
and a transient fault in an element. The system is
implemented in a modular fashion, and each module
is protected by a self-checking checker. In the event
of a fault, the system will indicate its presence
on-line. If there is a fault combination whose effect
cannot be corrected, the output bus is disconnected
from the system, thus preventing the propagation of
erroneous information to other systems which may be
connected to the faulty system.
a b c    Faulty component    Output source

0 0 0                        module X (or Y or Z)
0 1 1    module Z            module Y (or X)
1 0 0    comparator a
1 0 1    module Y            module Z
1 1 0    module X            module Z

Table 1 Reconfiguration of faulty modules
Table 2 Encoding of 3-bit binary patterns using 2-out-of-4 code

Table 3 1-out-of-4 to 1-out-of-2 conversion.

Fig. 1 Fault tolerant system block diagram


